Pattern Mining Across Domain-Specific Text Collections

Authors

  • Lee Gillam
  • Khurshid Ahmad
Abstract

This paper discusses a consistency in patterns of language use across domain-specific collections of text. We present a method for the automatic identification of domain-specific keywords – specialist terms – based on comparing language use in scientific domain-specific text collections with language use in texts intended for a more general audience. The method supports automatic production of collocational networks, and of networks of concepts – thesauri, or so-called ontologies. The method involves a novel combination of existing metrics from work in computational linguistics, which can enable extraction, or learning, of these kinds of networks. Creation of ontologies or thesauri is informed by international (ISO) standards in terminology science, and the resulting resource can be used to support a variety of work, including data-mining applications.
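The abstract does not name the metrics it combines, so the following is only a minimal sketch of the general idea of comparing language use in a domain-specific collection with language use in general text. It assumes a simple smoothed ratio of relative word frequencies; the function names (`tokenize`, `frequency_ratio`, `top_keywords`), the smoothing constant, and the toy texts are all hypothetical and are not taken from the paper.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_ratio(domain_tokens, general_tokens, smoothing=1.0):
    """Score each domain word by its relative frequency in the domain
    collection divided by its (smoothed) relative frequency in the
    general collection. Higher scores suggest domain-specific usage.

    The additive smoothing is an assumption made here so that words
    absent from the general collection do not cause division by zero.
    """
    domain_counts = Counter(domain_tokens)
    general_counts = Counter(general_tokens)
    n_domain = len(domain_tokens)
    n_general = len(general_tokens)

    scores = {}
    for word, count in domain_counts.items():
        domain_rel = count / n_domain
        general_rel = (general_counts[word] + smoothing) / (n_general + smoothing)
        scores[word] = domain_rel / general_rel
    return scores


def top_keywords(domain_text, general_text, k=3):
    """Return the k highest-scoring candidate keywords."""
    scores = frequency_ratio(tokenize(domain_text), tokenize(general_text))
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy texts for illustration only; real use would need large corpora.
    domain = "the nanotube lattice and the nanotube growth"
    general = "the cat and the dog sat on the mat and the cat slept"
    print(top_keywords(domain, general))
```

Words that are frequent in the domain text but rare or absent in the general text rise to the top, while function words such as "the" score low. Candidate keywords selected this way could then seed the collocational networks the abstract mentions, although how the paper does so is not described here.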

Similar resources

Data Mining

Data Mining provides approaches for the identification and discovery of non-trivial patterns and models hidden in large collections of data. In the applied natural language processing domain, data mining usually requires preprocessed data that has been extracted from textual documents. Additionally, this data is often integrated with other data sources. This chapter provides an overview on data...

Characterizing Regulatory Documents and Guidelines Based on Text Mining

Implementing rules, constraints, and requirements contained in regulatory documents such as standards or guidelines constitutes a mandatory task for organizations and institutions across several domains. Due to the amount of domain-specific information and actions encoded in these documents, organizations often need to establish cooperations between several departments and consulting experts to...

Systematic text-mining approach for deriving aspects and patterns from domain knowledge

As the theoretical underpinnings of aspect-orientation mature, its application across the software lifecycle has expanded. An active area of research focuses on the application of aspect oriented techniques to unstructured or semi-structured requirements documents. In this context, primary issues involve the identification of early aspects and various forms of aspectual manipulation (e.g., weav...

Cross-Domain Mining of Argumentative Text through Distant Supervision

Argumentation mining is considered a key technology for future search engines and automated decision making. In such applications, argumentative text segments have to be mined from large and diverse document collections. However, most existing argumentation mining approaches tackle the classification of argumentativeness only for a few manually annotated documents from narrow domains and reg...

Text Data Mining with Optimized Pattern Discovery

This paper describes an application of the optimized pattern discovery framework to text and Web mining. In particular, we introduce a class of simple combinatorial patterns over phrases, called proximity phrase association patterns, and consider the problem of finding the patterns that optimize a given statistical measure in a large collection of unstructured texts. For this class of patterns, ...

Publication date: 2005